Members:
Yehor Furtsev
Piotr Kaczmarek
Ximeng Liao
Hugo Vaartjes
Student numbers:
import pandas as pd
import os
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import plotly.subplots as sp
import plotly.io as pio
from plotly.subplots import make_subplots
from scipy import stats
import numpy as np
import geopandas as gpd
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
import math
import scipy
pio.renderers.default='notebook'
It has been decided to dedicate the project to analysing Dutch roads, with a focus on COVID-19 lockdowns and greenhouse gas emissions.
The objective of the research is as follows:
To obtain insight into the potential change in greenhouse gas emissions from motor vehicles on Dutch roads caused by the COVID-19 lockdowns, over the period 2019-2021.
This leads to the research question:
What was the effect of COVID-19 on emissions of motor vehicles on Dutch roads?
To structure the approach to the main research question, it has been divided into sub-questions.
Sub-questions:
- How many COVID-19 cases were recorded in total per day during the entire pandemic?
- Which government measures had the most impact on the increase or decrease in COVID-19 cases per day?
To visualize the total number of COVID-19 cases per day, some data cleaning was required. The datasets provided by the RIVM record the number of cases per municipality per day, so in Excel the cases of all municipalities were summed per day to obtain the nation-wide total number of cases per day.
After these operations, the data file was loaded with pandas and visualized using plotly.express.
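The summation over municipalities that was done in Excel can also be sketched directly in pandas. This is a minimal sketch assuming hypothetical column names `Date`, `Municipality` and `Cases` for the raw RIVM file (the real file's column names may differ), with a small made-up example rather than RIVM data:

```python
import pandas as pd


def national_cases_per_day(raw: pd.DataFrame) -> pd.DataFrame:
    """Sum municipal case counts into one nation-wide total per day.

    Assumes (hypothetically) one row per municipality per day, with
    columns 'Date' and 'Cases'.
    """
    return (
        raw.groupby("Date", as_index=False)["Cases"]
        .sum()
        .sort_values("Date")
        .reset_index(drop=True)
    )


# Made-up example: two municipalities over two days
raw = pd.DataFrame({
    "Date": ["2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02"],
    "Municipality": ["Delft", "Leiden", "Delft", "Leiden"],
    "Cases": [3, 5, 4, 6],
})
print(national_cases_per_day(raw))
```

Doing the aggregation in code keeps the cleaning step reproducible when a newer RIVM file is downloaded.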
# Getting the current directory
cwd = os.getcwd()
# Extract the data from the .csv file
file_path = str(cwd) + '/Data /' + 'Covid_Cases_Per_day.csv'
df = pd.read_csv(file_path, delimiter=';')
# Plot the daily cases, with one coloured line per year
fig = px.line(df, x='Date', y='Cases', color='Year')
# Add a range slider below the graph (and a 1-day range selector button)
fig.update_layout(
    xaxis=dict(
        rangeselector=dict(buttons=[dict(count=1, step="day", stepmode="backward")]),
        rangeslider=dict(visible=True),
    )
)
fig.show()
The graph above shows the total number of cases per day. Additionally, using the slider system, the viewer can focus on a smaller time frame, to see the graph in more detail.
The exact implementation dates of the main government measures were taken from the RIVM website and are annotated in the graph.
# Government measures: (date, y-position of the arrow head, label, x and y of the text)
measures = [
    ('24/03/2020', 749, 'Intelligent Lockdown', '19/05/2020', 42000),
    ('23/10/2020', 9972, 'Partial Lockdown', '03/07/2020', 21000),
    ('15/12/2020', 11166, 'Full Lockdown', '06/11/2020', 24000),
    ('06/01/2021', 7087, 'First Vaccination', '30/12/2020', 42000),
    ('23/01/2021', 4881, 'Start of Curfew', '12/03/2021', 32000),
    ('19/12/2021', 13255, 'Lockdown Reinstated', '01/11/2021', 42000),
    ('26/01/2022', 67000, 'Lockdown Lift (Omicron Variant)', '08/09/2021', 82000),
]
# Add one arrow annotation per measure; the arrow head and the text position
# are both given in data coordinates
for x, y, label, tail_x, tail_y in measures:
    fig.add_annotation(x=x, y=y, text=label, ax=tail_x, ay=tail_y,
                       xref='x', yref='y', axref='x', ayref='y')
# The range slider configured earlier is kept, so the figure can be shown directly
fig.show()
The main government measures can be seen in the graph above.
At the beginning of the pandemic, the number of cases per day was relatively low. Each of the three lockdowns in 2020 was followed by a clear decrease in the number of cases per day.
In January 2021, the first vaccinations were administered. Together with the curfew, this coincided with the number of cases per day remaining under control. At the end of 2021, a large new wave of COVID-19 cases emerged, largely due to the newly discovered Omicron variant.
In 2022, the number of cases per day skyrocketed due to the highly contagious Omicron variant. At this point, government measures had little to no visible effect, and COVID-19 was increasingly regarded as comparable to the flu rather than as a dangerous virus.
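In the graph, the impact of each measure is judged visually. One simple way to quantify it is to compare the mean number of daily cases in a window before and after each measure date. The sketch below assumes the daily totals are available as a hypothetical `date -> cases` mapping and uses an arbitrary 14-day window. Because of the incubation period and testing delays, any effect of a measure would show up with a lag, and a before/after difference alone does not prove causation.

```python
from datetime import date, timedelta
from statistics import mean


def mean_cases_around(cases_by_day, measure_day, window=14):
    """Mean daily cases in the `window` days before and after `measure_day`.

    `cases_by_day` maps datetime.date -> number of cases (hypothetical input
    format). Days missing from the mapping are skipped; the measure day itself
    counts towards the 'after' window. Raises statistics.StatisticsError if a
    window contains no data.
    """
    before_days = [measure_day - timedelta(days=d) for d in range(1, window + 1)]
    after_days = [measure_day + timedelta(days=d) for d in range(window)]
    before = [cases_by_day[d] for d in before_days if d in cases_by_day]
    after = [cases_by_day[d] for d in after_days if d in cases_by_day]
    return mean(before), mean(after)


# Made-up series around the partial lockdown of 23/10/2020 (not RIVM data)
start = date(2020, 10, 9)
cases = {start + timedelta(days=i): (1000 if i < 14 else 600) for i in range(28)}
print(mean_cases_around(cases, date(2020, 10, 23)))
```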
Sub-questions that will be answered in this section:
- What is the impact of COVID-19 on the average number of movements of Dutch residents?
- In which area of the Netherlands are residents most willing to travel?
- In which area was the average number of movements most affected by COVID-19?
To understand the distribution of the average number of movements per area in the Netherlands, the geometry information of the Netherlands was extracted and cleaned in this section so that it can be used with the geopandas package. This also helps to identify the area where people's trips were most affected by COVID-19. More detailed data can be found in the notebook Simmon's Visualization.
# Read the geographic information (provinces) of the Netherlands
lsoas = gpd.read_file(str(cwd) + '/Data /' + 'gadm41_NLD_1.shp')
# Drop the water-body areas (rows 5 and 12), for which no data are available
df1 = lsoas.drop(index=[5, 12])
# Plot the map of the Netherlands: all areas (left) and without water bodies (right)
fig, ax = plt.subplots(1, 2, figsize=(8, 8))
lsoas.plot(ax=ax[0])
df1.plot(ax=ax[1], color='r', edgecolor='w')  # water-body areas excluded
# Highlight the area in row 13 in yellow on the left map
a = lsoas.iloc[13:14, :]
a.plot(color='yellow', ax=ax[0])
# Show the water-body areas in green on the left map
b = lsoas.loc[[5, 12]]
b.plot(ax=ax[0], color='green')
The red map on the right, which excludes the water bodies, is the one used in the visualizations.
In this section, the dataset obtained from CBS is processed to show the average number of movements (trips) per person per year.
# Load the original dataset of average movements
data = pd.read_csv(str(cwd) + '/Data /' + 'Movements per person per year.csv')
##################
# Data cleaning
# Row labels: the year row, followed by the total and each transport mode
a = ['year',
     'Total',
     'Passenger car (driver)',
     'Passenger car (passenger)',
     'Train',
     'Bus/tram/metro',
     'Bicycle',
     'To walk',
     'Other mode of transport']


def Obtain(b):
    """Return the 4 yearly columns of one region, starting at column index b.

    All four columns are renamed to the region name (the first column's name)
    and the rows are labelled with the list `a`.
    """
    y = data.loc[[0, 2, 3, 4, 5, 6, 7, 8, 9], list(data.columns[b:b + 4])]
    y = y.rename(columns={y.columns[1]: y.columns[0],
                          y.columns[2]: y.columns[0],
                          y.columns[3]: y.columns[0]})
    y.index = pd.Series(a)
    return y


# Start with the data for the whole Netherlands...
test = Obtain(2)
# ...then add the remaining regions, 4 columns at a time
for i in range(6, 54, 4):
    test = test.join(Obtain(i))

# Reshape into the long format used by plotly.express (transport modes only)
u = test.iloc[2:, :]
u.columns = pd.Series(list(test.iloc[0, :]))  # use the year row as column labels
b = u.stack().reset_index()
b.columns = pd.Series(['type', 'year', 'number of'])
# stack() lists all columns of each row in turn, so the region names
# repeat once per transport mode
Province = list(test.columns) * len(u)
b['Region'] = Province
# Convert to floats so the values can be compared and calculated; missing values become 0
b['number of'] = pd.to_numeric(b['number of'], errors='coerce')
b = b.fillna(0)
# Plot a bar graph per region, animated over the regions
fig = px.bar(b, x='year', y='number of', animation_frame='Region', color='type',
             title='Average trips per person per year')
fig.show()
A clear drop in the average number of trips per person can be seen across the regions of the Netherlands in 2020, the year in which the government introduced strict lockdown measures.
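The reshaping above stacks the wide table and then attaches the region names by position, which relies on the row order produced by `stack()`. A hedged alternative is `pandas.melt`, which carries the identifier columns along explicitly. The sketch uses a small made-up frame whose layout (one row per region and transport mode, one column per year) is an assumption and differs from the raw CBS file:

```python
import pandas as pd

# Hypothetical wide layout with made-up values: one row per (region, mode), one column per year
wide = pd.DataFrame({
    "Region": ["Drenthe", "Drenthe", "Utrecht", "Utrecht"],
    "type": ["Train", "Bicycle", "Train", "Bicycle"],
    "2019": ["50", "300", "80", "350"],
    "2020": ["20", "250", ".", "310"],
})

long_df = wide.melt(id_vars=["Region", "type"], var_name="year", value_name="number of")
# Non-numeric placeholders such as '.' become NaN and are then set to 0,
# mirroring the to_numeric(errors='coerce') / fillna(0) steps
long_df["number of"] = pd.to_numeric(long_df["number of"], errors="coerce").fillna(0)
print(long_df)
```

Because every row keeps its own `Region` label, the result does not depend on the order in which pandas emits the stacked rows.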
# Build the dataframe used with the shapefile (all years 2018-2021, including the 'Total' row),
# to see the distribution of trips in the Netherlands before and during COVID-19
u = test.iloc[1:, :]
u.columns = pd.Series(list(test.iloc[0, :]))  # use the year row as column labels
b = u.stack().reset_index()
b.columns = pd.Series(['type', 'year', 'number of'])
# stack() lists all columns of each row in turn, so region names repeat once per row of u
Province = list(test.columns) * len(u)
b['Region'] = Province
# Rename the regions to their Dutch names to match the original SHP file
Geo = b.set_index('Region').rename(index={'North Brabant': 'Noord-Brabant',
                                          'North Holland': 'Noord-Holland',
                                          'Zealand': 'Zeeland'})
# Convert the values from str to int so they can be plotted
Geo['number of'] = Geo['number of'].astype('int')
Geo = Geo.reset_index()
# Plot the total trips per region for each year on the map to see the distribution
# (scheme='equal_interval' requires the mapclassify package)
year = [2018, 2019, 2020, 2021]
f, ax = plt.subplots(4, 1, figsize=(50, 60))
for i in range(4):
    G = Geo[(Geo['type'] == 'Total') & (Geo['year'] == str(year[i]))
            & (Geo['Region'] != 'The Netherlands')].set_index('Region')
    DATA = df1.join(G, on='NAME_1')
    DATA.plot(column='number of', scheme='equal_interval', k=6, alpha=1,
              edgecolor='w', linewidth=1, legend=True, ax=ax[i])
    ax[i].set_title(f'Average trips per resident per year in the Netherlands ({year[i]})')
A detailed map of the Netherlands, including the name of each province, is shown below.
The maps show the distribution of the average number of trips per person in each region of the Netherlands from 2018 to 2021, to see whether some regions were affected more heavily by COVID-19 than others (see the figures above). The geographic shapefile data of the Netherlands were obtained from gadm.org and cleaned before plotting.
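The `scheme='equal_interval', k=6` option splits the range of the plotted values into six classes of equal width (geopandas delegates the classification to the mapclassify package). Because `DATA.plot` is called separately for each year, the class boundaries are recomputed for every map, so the same colour can stand for different value ranges in different years. The sketch below is written from the definition of equal-interval classification, not from mapclassify's source. Treating upper bounds as inclusive is an assumption here.

```python
def equal_interval_classes(values, k=6):
    """Split [min(values), max(values)] into k classes of equal width.

    Returns a list of (lower, upper) bounds, one per class.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [(lo + i * width, lo + (i + 1) * width) for i in range(k)]


def classify(value, classes):
    """Index of the class containing `value` (upper bounds treated as inclusive)."""
    for index, (_, upper) in enumerate(classes):
        if value <= upper:
            return index
    # Guard against floating-point round-off at the maximum value
    return len(classes) - 1


# Made-up values for illustration (not CBS data)
classes = equal_interval_classes([900.0, 960.0, 1050.0], k=6)
print(classes)
print(classify(930.0, classes))
```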
Based on the maps, residents of Drenthe, Overijssel and Gelderland (the yellow regions on the maps) make the most trips, suggesting that people there are more willing to travel than people in other regions. Drenthe appears to be the region where residents' trips were most affected by the COVID-19 lockdown: its value range changed from 1010.83-1027 to 866-883.33 (with the colour changing from yellow to green), meaning it dropped to a lower class. Interestingly, the average number of trips in Friesland increased slightly during the pandemic.
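Reading the change from the colour classes is indirect. A more direct check is to rank the regions by the relative change in average trips between two years. A minimal sketch, assuming a hypothetical input format of one dictionary per year (region name -> average trips per person); the numbers in the example are made up, not CBS values:

```python
def relative_change_by_region(before, after):
    """Return (region, % change) pairs, sorted from the largest drop to the largest rise.

    `before` and `after` map region name -> average trips per person per year.
    Only regions present in both mappings are compared.
    """
    changes = {
        region: 100.0 * (after[region] - before[region]) / before[region]
        for region in before
        if region in after
    }
    return sorted(changes.items(), key=lambda item: item[1])


# Made-up example values
trips_2019 = {"Region A": 1000.0, "Region B": 900.0, "Region C": 950.0}
trips_2020 = {"Region A": 850.0, "Region B": 810.0, "Region C": 960.0}
for region, pct in relative_change_by_region(trips_2019, trips_2020):
    print(f"{region}: {pct:+.1f}%")
```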
The goal of this section is to answer the sub-question:
Which types of vehicles on Dutch roads did not experience a reduction in kilometres travelled during the 2020-2021 period?
To answer this, a data search was conducted first. A dataset containing the data needed to build the graphs for this sub-question was found at opendata.cbs.nl. As a raw file, the dataset was not ready to be used in Python, so the CSV was opened in Microsoft Excel and a couple of names and rows were updated. The resulting file was then ready to be loaded as a dataframe.
# Extracting the data file from the folder
file_path = str(cwd) + '/Data /' + 'Verkeersprestaties_motorvoertuigen__kilometers__voertuigsoort__grondgebied_12102022_120827.csv'
df = pd.read_csv(file_path, sep=';')
# Grouping the data by vehicle type
grouped = df.groupby('Vehicle_types')
# Dividing the main dataframe per vehicle type to plot the data more easily
df_t_m_v = grouped.get_group("Total motor vehicles")
df_p_c = grouped.get_group("Passenger car")
df_d_v = grouped.get_group("delivery van")
df_t = grouped.get_group("Truck (excl. tractor for trailer)")
df_t_s = grouped.get_group("Tractor for semi-trailer")
df_s_v = grouped.get_group("Special vehicle")
df_b = grouped.get_group("Bus")
The data distinguish several vehicle types: passenger cars, delivery vans, trucks (excluding tractors for trailers), tractors for semi-trailers, special vehicles and buses, together with the total for all motor vehicles.
The resulting figure summarizes the anticipated drop in kilometres driven. However, because passenger cars dominate the figure, it is hard to see that some vehicle types actually experienced a rise in the number of kilometres driven.
# Create one figure with a line for every vehicle type
fig = go.Figure()
traces = [
    (df_t_m_v, "Total motor vehicles"),
    (df_p_c, "Passenger car"),
    (df_d_v, "Delivery van"),
    (df_t, "Trucks (excl. Trucks for trailer)"),
    (df_t_s, "Trucks for semi-trailer"),
    (df_s_v, "Special vehicle"),
    (df_b, "Bus"),
]
for frame, name in traces:
    fig.add_trace(go.Scatter(
        x=frame.loc[:, 'Periods'],
        y=frame.loc[:, 'Totaal_kilometers_in_Nederland'],
        name=name  # this sets its legend entry
    ))
fig.update_layout(
    title="Traffic performance motor vehicles; kilometres, vehicle type",
    xaxis_title="Years",
    yaxis_title="Mln km",
    legend_title="Vehicle type",
    font=dict(family="Courier New, monospace", size=18, color="RebeccaPurple")
)
fig.show()
It is worth mentioning that the only major vehicle type that did not experience a drop was the trucks for semi-trailers (listed as "Tractor for semi-trailer" in the CBS data). This may be due to the large volume of deliveries to consumers who were in lockdown.
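Rather than reading the trend from the graph, the change per vehicle type between two periods can also be computed. A sketch, assuming the columns `Vehicle_types`, `Periods` and `Totaal_kilometers_in_Nederland` of the CBS file used here hold numeric kilometres, and that `start` and `end` exactly match values in `Periods` (whether that column holds plain years or labels with markers is not shown here). The example numbers are made up:

```python
import pandas as pd


def change_per_vehicle_type(df, start, end):
    """Percentage change in kilometres per vehicle type from `start` to `end`.

    Returns a Series indexed by vehicle type, sorted from the largest drop
    to the largest rise.
    """
    km = df.pivot_table(index="Vehicle_types", columns="Periods",
                        values="Totaal_kilometers_in_Nederland", aggfunc="sum")
    change = 100 * (km[end] - km[start]) / km[start]
    return change.sort_values()


# Made-up example values (not CBS data)
example = pd.DataFrame({
    "Vehicle_types": ["Passenger car", "Passenger car", "Bus", "Bus",
                      "Tractor for semi-trailer", "Tractor for semi-trailer"],
    "Periods": [2019, 2020, 2019, 2020, 2019, 2020],
    "Totaal_kilometers_in_Nederland": [100000, 80000, 600, 500, 5000, 5100],
})
print(change_per_vehicle_type(example, 2019, 2020))
```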
# Separate figure for the trucks for semi-trailer only
fig2 = go.Figure()
fig2.add_trace(go.Scatter(
    x=df_t_s.loc[:, 'Periods'],
    y=df_t_s.loc[:, 'Totaal_kilometers_in_Nederland'],
    name="Trucks for semi-trailer"
))
fig2.update_layout(
    title="Traffic performance of Trucks for semi-trailer; kilometres",
    xaxis_title="Years",
    yaxis_title="Mln km",
    legend_title="Vehicle type",
    font=dict(family="Courier New, monospace", size=18, color="RebeccaPurple")
)
fig2.show()
Sub-Question:
How are the emissions related to the average travel distance of people in the Netherlands?
# Import the road traffic emissions dataset
df_road = pd.read_csv(str(cwd) + '/Data /' + 'Road emissions.csv', delimiter=',')
df_road
| | Periods | Carbon dioxide (CO2) | Carbon monoxide (CO) | PM10 (fine dust) |
|---|---|---|---|---|
| 0 | 2005 | 30200 | 393.4 | 8.6 |
| 1 | 2010 | 29600 | 370.4 | 6.3 |
| 2 | 2015 | 28100 | 317.0 | 4.5 |
| 3 | 2016 | 28900 | 307.2 | 4.4 |
| 4 | 2017 | 29200 | 308.1 | 4.3 |
| 5 | 2018 | 29400 | 286.1 | 4.2 |
| 6 | 2019 | 29300 | 268.4 | 4.0 |
| 7 | 2020 | 26100 | 224.1 | 3.5 |
| 8 | 2021 | 26600 | 217.3 | 3.6 |
The emissions dataset was obtained from cbs.nl and contains the emissions of CO2, CO and PM10 on Dutch territory, in million kg. The data cover various transport modes in the Netherlands from 2005 to 2021, with yearly values from 2015 onwards (before that, only 2005 and 2010 are included). Since the research objective focuses on vehicles on Dutch roads, the total road traffic emissions are used. The dataset was last updated in September 2022.
# Drop 2005 and 2010 (rows 0 and 1) so that only the yearly series (2015-2021) remains
df_roadlimited = df_road.drop([0, 1])
df_roadlimited
# Bar chart
figco2bar = px.bar(df_roadlimited, x='Periods', y='Carbon dioxide (CO2)',
title='Total road traffic emissions (Million kg)')
figco2bar.update_yaxes(range=(25000,31000))
figco2bar.update_traces(marker_color='red')
figco2bar.show()
The CO2 emission data are plotted as a bar chart in the figure above, and further below as a line chart and bar chart side by side. CO2 was chosen because it is the most commonly used measure when discussing climate change and the effects of human pollution on the environment. There is a clear drop in CO2 emissions from 2019 to 2020, from 29,300 to 26,100 million kg.
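The size of the 2019-2020 drop can be quantified for all three pollutants. A small sketch, with the values copied by hand from the road traffic emissions table (million kg):

```python
# Total road traffic emissions in million kg, copied from the emissions table
emissions = {
    "CO2": {2019: 29300, 2020: 26100},
    "CO": {2019: 268.4, 2020: 224.1},
    "PM10": {2019: 4.0, 2020: 3.5},
}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


for gas, values in emissions.items():
    print(f"{gas}: {percent_change(values[2019], values[2020]):.1f}% (2019 -> 2020)")
# CO2: -10.9% (2019 -> 2020)
# CO: -16.5% (2019 -> 2020)
# PM10: -12.5% (2019 -> 2020)
```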
# Bar chart
figcobar = px.bar(df_roadlimited, x='Periods', y='Carbon monoxide (CO)',
title='Total road traffic emissions (Million kg)')
figcobar.update_yaxes(range=(200,400))
figcobar.update_traces(marker_color='green')
figcobar.show()
# Bar chart
figpm10bar = px.bar(df_roadlimited, x='Periods', y='PM10 (fine dust)',
title='Total road traffic emissions (Million kg)')
figpm10bar.update_yaxes(range=(3,5))
figpm10bar.update_traces(marker_color='blue')
figpm10bar.show()
# Line chart of the same CO2 series, to be shown next to the bar chart
figco2line = px.line(df_roadlimited, x='Periods', y='Carbon dioxide (CO2)')
# Create a 1x2 subplot and copy the traces of each Express figure into it:
# the line chart goes on the left, the bar chart on the right
this_figure = sp.make_subplots(rows=1, cols=2)
for trace in figco2line.data:
    this_figure.add_trace(trace, row=1, col=1)
for trace in figco2bar.data:
    this_figure.add_trace(trace, row=1, col=2)
this_figure.update_layout(title_text="Total road traffic emissions (Million kg)")
this_figure.show()
Combination with car mobility data
df_use = pd.read_csv(str(cwd) + '/Data /' + 'Road usage.csv', delimiter=',')
df_use
| | Period | Average number of ride per day | Average km travelled per ride | Average distance travelled per day |
|---|---|---|---|---|
| 0 | 2017 | 2.81 | 10.38 | 29.1678 |
| 1 | 2018 | 2.89 | 10.37 | 29.9693 |
| 2 | 2019 | 2.82 | 10.44 | 29.4408 |
| 3 | 2020 | 2.34 | 8.77 | 20.5218 |
fig_com = make_subplots(specs=[[{"secondary_y": True}]])
fig_com.add_trace(
go.Bar(x = df_roadlimited["Periods"], y=df_roadlimited["Carbon dioxide (CO2)"], name="Yearly CO2 Emissions", marker_color = 'blue'),
secondary_y=False)
fig_com.add_trace(
go.Scatter(x=df_use["Period"], y=df_use["Average distance travelled per day"], name="Average distance travelled per day"),
secondary_y=True)
fig_com.update_layout(yaxis1 = dict(range=[25000,30000]))
fig_com.update_layout(title_text = "Distance travelled compared to CO2 emissions")
fig_com.update_layout(yaxis1_title="Million Kilograms")
fig_com.update_layout(yaxis2_title="Kilometers")
fig_com.update_layout(xaxis_title="Year")
fig_com.show()
When the yearly CO2 emissions are overlaid with the average distance travelled on Dutch roads per person per day, the two trends clearly move together. This is to be expected: fewer kilometres travelled means less vehicle activity and thus shorter engine operating times, which results in lower emissions from the combustion engines of the vehicles. The graph above shows a significant drop from 29.4 km to 20.5 km travelled per person per day between 2019 and 2020, which coincides with a drop in CO2 emissions from 29.3 to 26.1 million tonnes over the same period.
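The 2019-2020 changes quoted above can also be expressed as relative changes, using the values from the tables (average distance per person per day in km, CO2 in million kg), together with the conversion from million kg to million tonnes:

$$
\frac{20.5218 - 29.4408}{29.4408} \approx -30.3\%, \qquad \frac{26\,100 - 29\,300}{29\,300} \approx -10.9\%
$$

$$
29\,300 \text{ million kg} = 29.3 \times 10^{9} \text{ kg} = 29.3 \text{ million tonnes}
$$

In relative terms, the distance per person fell almost three times as much as the CO2 emissions, so over this period the two quantities move together but are not proportional.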
# Drop 2015, 2016 and 2021 (rows 2, 3 and 8) so that only the years present in both
# datasets (2017-2020) remain; errors='ignore' makes the cell safe to re-run
df_roadlimited = df_roadlimited.drop([2, 3, 8], errors='ignore')
r, p = stats.pearsonr(df_use["Average distance travelled per day"], df_roadlimited["Carbon dioxide (CO2)"])
print('r = ', r)
print('p = ', p)
r =  0.9996785108106682
p =  0.0003214891893318361
For the overlapping period (2017-2020), the Pearson correlation coefficient r can be computed, which measures the linear correlation between the average distance travelled and the CO2 emissions. With r ≈ 1.00 for these data, there is a very strong positive correlation between them. A plausible causal explanation is that the motor vehicles used for travelling emit CO2 directly when operating (apart from vehicles without combustion engines). The small p-value (≈ 0.0003) supports the correlation: it is the probability of finding a correlation at least this strong if the two series were in fact uncorrelated. Note, however, that the result is based on only four yearly data points.
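As a consistency check on the printed p-value: for a Pearson correlation computed from n points, the two-sided p-value follows from the t-statistic with n - 2 degrees of freedom. For the n = 4 overlapping years, the Student t-distribution with 2 degrees of freedom has a closed-form tail, which gives:

$$
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}} \;\overset{n=4}{=}\; \frac{\sqrt{2}\,r}{\sqrt{1-r^{2}}}, \qquad t^{2} + 2 = \frac{2}{1-r^{2}}
$$

$$
p = 1 - \frac{|t|}{\sqrt{t^{2}+2}} = 1 - |r|
$$

With r = 0.9996785108..., this gives p = 0.0003214891..., in agreement with the printed output. It also shows how little data the test rests on: with four points, even r = 0.95 would only give p = 0.05.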
The daily COVID-19 case data have been combined with the yearly emissions data for the Netherlands.
# Getting the current directory
cwd = os.getcwd()
# Extract the data from the .csv files; the first 1096 rows (days) cover 2019-2021
file_path = str(cwd) + '/Data /' + 'Covid Cases per Day incl 2019.csv'
df = pd.read_csv(file_path, delimiter=';').iloc[0:1096]
df_road = pd.read_csv(str(cwd) + '/Data /' + 'Road Emissions per date.csv', delimiter=';').iloc[0:1096]


def emissions_vs_cases(column, label, y_range, title, y_title):
    """Bars for one emission column, with the daily COVID-19 cases on a secondary y-axis."""
    fig = make_subplots(specs=[[{"secondary_y": True}]])
    fig.add_trace(go.Bar(x=df_road["Date"], y=df_road[column], name=label, marker_color='black'),
                  secondary_y=False)
    fig.add_trace(go.Scatter(x=df["Date"], y=df["Cases"], name="Cases"),
                  secondary_y=True)
    fig.update_layout(yaxis1=dict(range=y_range))
    fig.update_layout(title_text=title, yaxis1_title=y_title, yaxis2_title="Covid Cases")
    return fig


# Emission values are in million kg (PM10 values multiplied by 10)
fig_com = emissions_vs_cases("Carbon dioxide (CO2)", "CO2 (million kg)", [25000, 30000],
                             "Covid Cases compared to CO2 emissions",
                             "Carbon dioxide emissions, CO2 (million kg)")
fig_com2 = emissions_vs_cases("Carbon monoxide (CO)", "CO (million kg)", [200, 280],
                              "Covid Cases compared to CO emissions",
                              "Carbon monoxide emissions, CO (million kg)")
fig_com3 = emissions_vs_cases("PM10 (fine dust)", "PM10 (million kg, x10)", [32, 47],
                              "Covid Cases compared to PM10 (fine dust)",
                              "PM10 fine dust (million kg, x10)")
fig_com.show()
fig_com2.show()
fig_com3.show()
The three graphs above show how the total number of COVID-19 cases per day relates to the yearly road traffic emissions of CO2, CO and fine dust (PM10) in the Netherlands. The time frame was chosen as 2019-2021.
2022 was excluded because the year was not yet finished, so its emissions data were incomplete. No COVID-19 cases were recorded in 2019, which explains the flat line at the beginning of the line graph. However, 2019 is needed as a pre-pandemic reference for the emissions, so including it is necessary for a meaningful comparison. This way, it can be seen that the emissions of all three pollutants dropped significantly once COVID-19 reached the Netherlands in 2020.
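The row limit `.iloc[0:1096]` used when loading both files keeps the first 1096 rows. Assuming each file starts on 1 January 2019 and has exactly one row per day, that is exactly the period 2019-2021, as this small check shows:

```python
from datetime import date

# Number of calendar days from 1 January 2019 up to and including 31 December 2021
n_days = (date(2022, 1, 1) - date(2019, 1, 1)).days
print(n_days)  # 1096 (365 + 366 + 365; 2020 was a leap year)
```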
After the combined graphs have been shown, the correlation between the two datasets can be computed with the Pearson correlation coefficient (`stats.pearsonr`).
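`stats.pearsonr` returns the Pearson correlation coefficient r together with its p-value. For reference, r can be computed from its definition (covariance divided by the product of the standard deviations). The sketch below is a minimal version of that definition, not scipy's implementation. Applied to the 2017-2020 distance and CO2 values from the tables earlier in this section, it should reproduce the r ≈ 0.99968 printed there:

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# 2017-2020 values from the tables: km per person per day, and CO2 in million kg
distance = [29.1678, 29.9693, 29.4408, 20.5218]
co2 = [29200, 29400, 29300, 26100]
print(pearson_r(distance, co2))
```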
# Each result is a pair (r, p): the Pearson correlation coefficient and its p-value
co2_cases = stats.pearsonr(df["Cases"], df_road["Carbon dioxide (CO2)"])
co_cases = stats.pearsonr(df["Cases"], df_road["Carbon monoxide (CO)"])
dust_cases = stats.pearsonr(df["Cases"], df_road["PM10 (fine dust)"])
print("The Pearson correlation (r, p) between CO2 emissions and COVID-19 cases is:", co2_cases)
print("\n")
print("The Pearson correlation (r, p) between CO emissions and COVID-19 cases is:", co_cases)
print("\n")
print("The Pearson correlation (r, p) between PM10 (fine dust) emissions and COVID-19 cases is:", dust_cases)
The Pearson correlation (r, p) between CO2 emissions and COVID-19 cases is: (-0.38873315367394823, 7.434267464969631e-41)
The Pearson correlation (r, p) between CO emissions and COVID-19 cases is: (-0.494322142501995, 1.3401199675369235e-68)
The Pearson correlation (r, p) between PM10 (fine dust) emissions and COVID-19 cases is: (-0.4125576025568201, 2.7689958947318336e-46)
The correlation coefficients show that all three correlations are negative, which is also visible in the graphs: as the number of COVID-19 cases increases, the total emissions decrease. This is likely because the government introduced several measures to keep people at home, which resulted in less traffic on the roads and thus lower emissions.
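The correlations are only moderate in strength (|r| between 0.39 and 0.49); a strong correlation typically has r between 0.8 and 1.0 (positive) or between -0.8 and -1.0 (negative). Moreover, because the emissions are yearly figures, the daily emissions series can change at most once per year, so the 1096 daily points are not independent observations and the very small p-values likely overstate the significance. The relatively weak correlations can likely be explained by two factors: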
- The behaviour of the Dutch people. In the first year of COVID-19, people tended to respect the measures as much as they could. After a year, the situation stabilized and people went outdoors more and more, leading to more traffic and thus more emissions.
- New variants, especially the Omicron variant, caused the total number of cases to increase dramatically while emissions remained more or less stable, which weakened the correlation further.
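To compare cases and emissions at the same (yearly) resolution, the daily case counts can be summed per calendar year. A minimal sketch, assuming a hypothetical `date -> cases` mapping as input and made-up example values. Note that for 2019-2021 this leaves only three data points, which is too few for a meaningful significance test.

```python
from collections import defaultdict
from datetime import date


def yearly_totals(cases_by_day):
    """Sum daily case counts per calendar year.

    `cases_by_day` maps datetime.date -> cases (hypothetical input format).
    Returns a dict year -> total, sorted by year.
    """
    totals = defaultdict(int)
    for day, cases in cases_by_day.items():
        totals[day.year] += cases
    return dict(sorted(totals.items()))


# Made-up example values (not RIVM data)
daily = {date(2019, 6, 1): 0, date(2020, 3, 1): 10, date(2020, 12, 1): 5, date(2021, 1, 1): 7}
print(yearly_totals(daily))  # {2019: 0, 2020: 15, 2021: 7}
```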
With all the data gathered and compiled, this section reviews how the answers to the sub-questions were obtained.
First, the sub-questions were answered to help answer the main research question. In the "COVID-19 data required for visualization" part of the report, all required information on COVID-19 cases and government measures was obtained. In the "Geographic data needed for visualization" chapter, the average number of trips per person of Dutch residents was analysed per province. That chapter showed the impact of the COVID-19 restrictions on travel behaviour and how the number of trips decreased during the period of restrictions. In the chapter "Kilometres driven by the types of vehicles", an important distinction between vehicle types was made, and the vehicle type that did not experience a downtrend (trucks for semi-trailers) was identified. Lastly, the "Emission data" section showed how the emissions relate to the average travel distance of people in the Netherlands and to the daily number of COVID-19 cases.
In conclusion, regarding the objective of finding the effect of COVID-19 on the CO2 emissions of vehicles on Dutch roads: the number of COVID-19 cases is negatively correlated with these emissions. With higher numbers of cases, the CO2 emissions were lower.